- The ability to accurately interpret complex visual information is a crucial capability for multimodal large language models (MLLMs). Recent work indicates that enhanced visual perception significantly reduces hallucinations and improves performance on resolution-sensitive tasks such as optical character recognition and document analysis. Several recent MLLMs achieve this with a mixture of vision encoders. Despite their success, there is a lack of systematic comparisons and detailed ablation studies addressing critical aspects such as expert selection and the integration of multiple vision experts. This study provides an extensive exploration of the design space for MLLMs that use a mixture of vision encoders and resolutions. Our findings reveal several underlying principles common to existing strategies, leading to a streamlined yet effective design approach. We discover that simply concatenating visual tokens from a set of complementary vision encoders is as effective as more complex mixing architectures or strategies (a minimal illustrative sketch of such fusion appears after this list). We additionally introduce Pre-Alignment to bridge the gap between vision-focused encoders and language tokens, enhancing model coherence. The resulting family of MLLMs, Eagle, surpasses other leading open-source models on major MLLM benchmarks. Free, publicly accessible full text available April 24, 2026.
- Artificial Intelligence (AI) is poised to revolutionize numerous aspects of human life, with healthcare among the most critical fields set to benefit from this transformation. Medicine remains one of the most demanding, expensive, and impactful sectors, facing challenges such as information retrieval, data organization, diagnostic accuracy, and cost reduction. AI is uniquely suited to address these challenges, ultimately improving quality of life and reducing healthcare costs for patients worldwide. Despite this potential, the adoption of AI in healthcare has been slower than in other industries, highlighting the need to understand the specific obstacles hindering its progress. This review identifies the current shortcomings of AI in healthcare and explores its possibilities, realities, and frontiers to provide a roadmap for future advancements. Free, publicly accessible full text available December 1, 2025.
- Noise and inconsistency are common in real-world information networks, owing to the inherently error-prone nature of human input or to user privacy concerns. To date, tremendous effort has gone into advancing feature learning from networks, including the most recent graph convolutional networks (GCNs) and attention GCNs, by integrating node content and topology structure. However, existing methods treat networks as error-free sources and regard the feature content of each node as independent and equally important for modeling node relations. Noisy node content, combined with sparse features, poses significant challenges for applying these methods to real-world noisy networks. In this article, we propose the feature-based attention GCN (FA-GCN), a feature-attention graph convolution learning framework for networks with noisy and sparse node content. To tackle noise and sparsity in each node, FA-GCN first employs a long short-term memory (LSTM) network to learn a dense representation for each node feature. To model interactions between neighboring nodes, a feature-attention mechanism allows neighboring nodes to learn and vary feature importance with respect to their connections (a simplified sketch of such a layer appears after this list). Through a spectral-based graph convolution aggregation process, each node can then concentrate on the neighborhood features most relevant to the learning task. Experiments and validations across different noise levels demonstrate that FA-GCN outperforms state-of-the-art methods in both noise-free and noisy network environments.
- Structural biology efforts using cryogenic electron microscopy are frequently stifled by specimens adopting "preferred orientations" on grids, leading to anisotropic map resolution and impeding structure determination. Tilting the specimen stage during data collection is a generalizable solution but has historically led to substantial resolution attenuation. Here, we develop updated data collection and image processing workflows and demonstrate, using multiple specimens, that resolution attenuation is negligible or significantly reduced across tilt angles. Reconstructions with and without the stage tilted as high as 60° are virtually indistinguishable. These strategies allowed the reconstruction to 3 Å resolution of a bacterial RNA polymerase with preferred orientation, containing an unnatural nucleotide for studying novel base pair recognition. Furthermore, we present a quantitative framework that allows cryo-EM practitioners to define an optimal tilt angle during data acquisition (an idealized geometric illustration of why tilt helps appears after this list). These results reinforce the utility of employing stage tilt for data collection and provide quantitative metrics to obtain isotropic maps.
- Real-world networked systems often exhibit dynamic behavior, with network nodes and topology continuously evolving over time. When learning from dynamic networks, it is beneficial to correlate all temporal networks to fully capture the similarity and relevance between nodes. Recent work on dynamic network representation learning typically trains each snapshot network independently and imposes relevance regularization across networks at different time steps. Such a snapshot scheme fails to leverage topology similarity between temporal networks for progressive training. In addition to the static node relationships within each network, nodes can show similar variation patterns (e.g., changes in local structure) across the temporal network sequence. Both static node structures and temporal variation patterns can be combined to better characterize node affinities for unified embedding learning. In this paper, we propose Graph Attention Evolving Networks (GAEN) for dynamic network embedding that preserves similarities between nodes derived from their temporal variation patterns. Instead of training graph attention weights for each network independently, we allow model weights to be shared and to evolve across all temporal networks according to their topology discrepancies (a toy sketch of such weight evolution appears after this list). Experiments and validations on four real-world dynamic graphs demonstrate that GAEN outperforms the state-of-the-art in both link prediction and node classification tasks.
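The fusion strategy highlighted in the Eagle abstract above, plain concatenation of visual tokens from complementary encoders, can be illustrated with a short PyTorch sketch. The class name, the shared 24x24 token grid, and the single linear projector are illustrative assumptions, not Eagle's published configuration.

```python
# Minimal sketch of channel-wise fusion of visual tokens from multiple vision
# encoders, in the spirit of the "simple concatenation" finding above. The
# 24x24 shared grid and the single linear projector are assumptions for
# illustration, not Eagle's published configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiEncoderFusion(nn.Module):
    def __init__(self, encoder_dims, llm_dim, grid_size=24):
        super().__init__()
        self.grid_size = grid_size                              # shared token grid per encoder
        self.projector = nn.Linear(sum(encoder_dims), llm_dim)  # into LLM embedding space

    def forward(self, token_maps):
        """token_maps: list of tensors, each (B, H_i, W_i, C_i) from one vision encoder."""
        resized = []
        for t in token_maps:
            x = t.permute(0, 3, 1, 2)                      # (B, C_i, H_i, W_i)
            x = F.interpolate(x, size=(self.grid_size, self.grid_size),
                              mode="bilinear", align_corners=False)
            resized.append(x.permute(0, 2, 3, 1))          # back to (B, G, G, C_i)
        fused = torch.cat(resized, dim=-1)                 # concatenate along channels
        fused = fused.flatten(1, 2)                        # (B, G*G, sum of C_i)
        return self.projector(fused)                       # (B, G*G, llm_dim)
```

One consequence of concatenating along the channel dimension of a shared grid, as in this sketch, is that the number of visual tokens handed to the language model stays fixed regardless of how many encoders are mixed in.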
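For the FA-GCN abstract, the sketch below shows, under stated assumptions, how an LSTM that densifies node features, a feature-attention weighting, and a normalized-adjacency graph convolution could fit together. The neighbor-conditioned attention the authors describe is simplified here to per-node feature attention, and all names and shapes are illustrative rather than taken from any released code.

```python
# Simplified sketch of a feature-attention graph convolution layer: an LSTM
# densifies each node's feature sequence, attention scores reweight the
# features per node, and a symmetrically normalized adjacency aggregates the
# result. Names, shapes, and the per-node (rather than neighbor-conditioned)
# attention are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureAttentionGCNLayer(nn.Module):
    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)  # densify sparse features
        self.attn = nn.Linear(feat_dim, 1)                          # score each feature slot
        self.gcn = nn.Linear(feat_dim, hidden_dim)                  # graph-convolution weights

    def forward(self, feat_seq, adj):
        """feat_seq: (N, F, D) sequences of F feature embeddings per node.
        adj: (N, N) adjacency matrix that already includes self-loops."""
        dense, _ = self.lstm(feat_seq)                    # (N, F, D) dense feature vectors
        scores = F.softmax(self.attn(dense), dim=1)       # (N, F, 1) feature attention
        node_repr = (scores * dense).sum(dim=1)           # (N, D) attended node features
        deg = adj.sum(dim=1)                              # node degrees
        d_inv_sqrt = deg.clamp(min=1e-12).pow(-0.5)
        a_hat = d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)  # D^-1/2 A D^-1/2
        return F.relu(self.gcn(a_hat @ node_repr))        # (N, hidden_dim)
```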
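For the cryo-EM tilt abstract, the snippet below is an idealized geometric illustration, not the authors' quantitative framework. Assuming a single sharply preferred viewing axis and uniformly random in-plane rotations, each image samples a central Fourier plane perpendicular to its viewing direction; a stage tilt of t then leaves a missing double cone of half-angle (90° - t), so the unsampled solid-angle fraction shrinks as the tilt grows.

```python
# Idealized illustration only: unsampled Fourier-space fraction as a function
# of stage tilt, for a specimen with one preferred viewing axis and random
# in-plane rotations. A real optimal-tilt analysis accounts for much more
# (orientation spread, dose, beam-induced motion, detector geometry).
import numpy as np


def missing_cone_fraction(tilt_deg):
    """Solid-angle fraction of Fourier space left unsampled at a given stage tilt."""
    half_angle = np.deg2rad(90.0 - tilt_deg)   # half-angle of the missing double cone
    return 1.0 - np.cos(half_angle)            # both cone caps over the full sphere


for tilt in (0, 30, 45, 60):
    print(f"tilt {tilt:2d} deg -> unsampled fraction ~ {missing_cone_fraction(tilt):.3f}")
```

The loop prints roughly 1.0, 0.5, 0.29, and 0.13 for tilts of 0°, 30°, 45°, and 60°, which is one simple way to see both why a high tilt helps and why the returns diminish as the tilt increases.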
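For the GAEN abstract, the toy sketch below illustrates sharing and evolving attention-layer weights across snapshots instead of training each snapshot independently. The GRU-based weight evolution and the mean-pooled topology summary driving it are assumptions for illustration and are not claimed to match the authors' architecture.

```python
# Toy sketch of evolving attention weights across temporal snapshots. The
# GRUCell-driven parameter update and the crude topology summary are
# illustrative assumptions; a practical model would be far more careful
# about scale and parameterization.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EvolvingAttentionLayer(nn.Module):
    """GAT-style layer whose weight matrix is evolved from snapshot to snapshot."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        self.w0 = nn.Parameter(torch.randn(in_dim * out_dim) * 0.01)   # initial weights
        self.evolve = nn.GRUCell(in_dim * out_dim, in_dim * out_dim)   # weight updater
        self.attn = nn.Linear(2 * out_dim, 1)

    def snapshot_forward(self, x, adj, w_flat):
        """One attention pass on a single snapshot with the current weights."""
        h = x @ w_flat.view(self.in_dim, self.out_dim)                 # (N, out_dim)
        n = h.size(0)
        pair = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                          h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.attn(pair).squeeze(-1))
        e = e.masked_fill(adj == 0, float("-inf"))                     # keep edges only
        return F.softmax(e, dim=1) @ h                                 # neighbor-attended output

    def forward(self, snapshots):
        """snapshots: time-ordered list of (features, adjacency-with-self-loops) pairs."""
        w_flat, outputs = self.w0, []
        for x, adj in snapshots:
            outputs.append(self.snapshot_forward(x, adj, w_flat))
            summary = (adj @ x).mean(dim=0)                            # crude topology summary
            signal = summary.repeat(self.out_dim)                      # match GRU input size
            w_flat = self.evolve(signal.unsqueeze(0), w_flat.unsqueeze(0)).squeeze(0)
        return outputs
```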